klotz: machine learning

"Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

https://en.wikipedia.org/wiki/Machine_learning


  1. A curated reading list for those starting to learn about Large Language Models (LLMs), covering foundational concepts, practical applications, and future trends, updated for 2026.
  2. This article explores the field of mechanistic interpretability, aiming to understand how large language models (LLMs) work internally by reverse-engineering their computations. It discusses techniques for identifying and analyzing the functions of individual neurons and circuits within these models, offering insights into their decision-making processes.
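
    One basic workflow in this area is recording a layer's per-neuron activations and asking which neurons fire most strongly on a given input. The sketch below does this with a PyTorch forward hook on a toy two-layer network; the model, sizes, and input are placeholders rather than anything from the article.

    ```python
    import torch
    import torch.nn as nn

    # Toy stand-in for one block of a much larger model.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    captured = {}

    def hook(module, inputs, output):
        # Save the post-ReLU activations (one value per "neuron").
        captured["acts"] = output.detach()

    model[1].register_forward_hook(hook)

    model(torch.randn(1, 16))

    # Rank neurons by activation strength on this input.
    top = torch.topk(captured["acts"][0], k=5)
    print("most active neurons:", top.indices.tolist())
    ```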
  3. This article details seven advanced feature engineering techniques using LLM embeddings to improve machine learning model performance. It covers techniques like dimensionality reduction, semantic similarity, clustering, and more.

    The article explores how to leverage LLM embeddings for advanced feature engineering in machine learning, going beyond simple similarity searches. It details seven techniques (a minimal sketch of three of them follows the list):

    1. **Embedding Arithmetic:** Performing mathematical operations (addition, subtraction) on embeddings to represent concepts like "positive sentiment - negative sentiment = overall sentiment".
    2. **Embedding Clustering:** Using clustering algorithms (like k-means) on embeddings to create categorical features representing groups of similar text.
    3. **Embedding Dimensionality Reduction:** Reducing the dimensionality of embeddings using techniques like PCA or UMAP to create more compact features while preserving important information.
    4. **Embedding as Input to Tree-Based Models:** Directly using embedding vectors as features in tree-based models like Random Forests or Gradient Boosting. The article highlights the importance of careful handling of high-dimensional data.
    5. **Embedding-Weighted Averaging:** Calculating weighted averages of embeddings based on relevance scores (e.g., TF-IDF) to create a single, representative embedding for a document.
    6. **Embedding Difference:** Calculating the difference between embeddings to capture changes or relationships between texts (e.g., before/after edits, question/answer pairs).
    7. **Embedding Concatenation:** Combining multiple embeddings (e.g., title and body of a document) to create a richer feature representation.
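
    As a concrete illustration, here is a minimal sketch combining three of these techniques (arithmetic, clustering, and PCA reduction) into one feature matrix. The embeddings are random stand-ins rather than real model output, and the code is not from the article.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 384))   # stand-in for 100 text embeddings

    # 1. Embedding arithmetic: project texts onto a concept direction.
    pos, neg = emb[0], emb[1]           # e.g. a "positive" and a "negative" anchor text
    sentiment_scores = emb @ (pos - neg)

    # 2. Embedding clustering: k-means labels as a categorical feature.
    cluster_id = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(emb)

    # 3. Dimensionality reduction: 384 dims down to 20 PCA components.
    emb_compact = PCA(n_components=20).fit_transform(emb)

    # Assemble features for a downstream (e.g. tree-based) model.
    features = np.column_stack([sentiment_scores, cluster_id, emb_compact])
    print(features.shape)               # (100, 22)
    ```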
  4. Daggr is a new, open-source Python library for building AI workflows that connect Gradio apps, ML models, and custom functions. It automatically generates a visual canvas where you can inspect intermediate outputs, rerun individual steps, and manage state for complex pipelines.
  5. This post discusses the limitations of using cosine similarity for compatibility matching, specifically in the context of a dating app. The author found that high cosine similarity scores didn't always translate to actual compatibility due to the inability of embeddings to capture dealbreaker preferences. They improved results by incorporating structured features and hard filters.
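
    A hedged sketch of that fix: score candidates by cosine similarity, then zero out any match that violates a hard dealbreaker filter. The profile fields, preference flag, and numbers here are invented for illustration.

    ```python
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    profiles = [
        {"name": "A", "emb": np.array([0.9, 0.1, 0.3]), "smoker": True},
        {"name": "B", "emb": np.array([0.8, 0.2, 0.4]), "smoker": False},
    ]
    query = {"emb": np.array([0.85, 0.15, 0.35]), "no_smokers": True}

    for p in profiles:
        score = cosine(query["emb"], p["emb"])
        # Hard filter: a dealbreaker overrides embedding similarity entirely.
        if query["no_smokers"] and p["smoker"]:
            score = 0.0
        print(p["name"], round(score, 3))
    ```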
  6. Sipeed’s MaixCAM2 is a powerful, open-source AI camera designed for makers, offering significant performance improvements over Raspberry Pi and OpenMV solutions. It features the Axera Tech AX630 AI SoC with up to 12.8 TOPS and supports training-free vision models and vision-language models.
  7. This article discusses the choice between Jetson Nano and Raspberry Pi 5 for building a first ROS2 robot, advocating for a Raspberry Pi 5-based kit like the MentorPi M1 to bypass hardware headaches and accelerate learning.
  8. A gentle introduction to Causal Machine Learning, covering the core concepts, differences from traditional ML, and practical applications with Python.
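
    To make the contrast with traditional ML concrete, here is a small simulated example (not from the article): a confounder biases the naive treatment-effect estimate, and regression adjustment recovers the true effect.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000
    confounder = rng.normal(size=n)                      # e.g. income
    treatment = (confounder + rng.normal(size=n) > 0).astype(float)
    outcome = 2.0 * treatment + 3.0 * confounder + rng.normal(size=n)

    # Naive estimate: difference in means, biased by the confounder.
    naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

    # Adjusted estimate: OLS on [intercept, treatment, confounder].
    X = np.column_stack([np.ones(n), treatment, confounder])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

    print(f"naive: {naive:.2f}, adjusted: {beta[1]:.2f}, true effect: 2.00")
    ```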
  9. This post introduces **GIST (Greedy Independent Set Thresholding)**, a new algorithm for selecting diverse and useful data subsets for machine learning. GIST tackles the NP-hard problem of balancing diversity (minimizing redundancy) and utility (relevance to the task) in large datasets.

    **Key points:**

    * **Approach:** GIST enforces a minimum distance between selected data points (diversity), then uses a greedy algorithm to approximate the highest-utility subset within that constraint, testing various distance thresholds.
    * **Guarantee:** GIST is guaranteed to find a subset with at least half the value of the optimal solution.
    * **Performance:** Experiments demonstrate GIST outperforms existing methods (Random, Margin, k-center, Submod) in image classification and single-shot downsampling.
    * **Application:** Already used to improve video recommendation diversity at YouTube.

    **GIST provides a mathematically grounded and efficient solution for selecting high-quality data subsets for machine learning, crucial as datasets scale.**
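
    A minimal sketch of that threshold-sweep idea as summarized above (not the paper's reference implementation): for each candidate distance threshold, greedily take the highest-utility points that stay at least that far apart, and keep the best subset found. Data, utilities, and parameters are synthetic.

    ```python
    import numpy as np

    def greedy_independent_set(points, utility, k, min_dist):
        """Pick up to k points, highest utility first, skipping any point
        closer than min_dist to one already selected (diversity constraint)."""
        chosen = []
        for i in np.argsort(-utility):
            if len(chosen) == k:
                break
            if all(np.linalg.norm(points[i] - points[j]) >= min_dist for j in chosen):
                chosen.append(i)
        return chosen

    def gist(points, utility, k, thresholds):
        """Sweep distance thresholds; keep the subset with highest total utility."""
        best, best_val = [], -np.inf
        for t in thresholds:
            subset = greedy_independent_set(points, utility, k, t)
            val = utility[subset].sum()
            if val > best_val:
                best, best_val = subset, val
        return best

    rng = np.random.default_rng(0)
    pts, util = rng.normal(size=(200, 2)), rng.random(200)
    print(gist(pts, util, k=10, thresholds=[0.1, 0.3, 0.5, 1.0]))
    ```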
  10. Zhipu AI has released GLM-4.7-Flash, a 30B-A3B MoE model designed for efficient local coding and agent applications. It offers strong coding and reasoning performance with a 128k token context length and supports English and Chinese.
